Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 149999 |
| Missing cells | 3924 |
| Missing cells (%) | 0.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 12.6 MiB |
| Average record size in memory | 88.0 B |
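The derived figures in this table can be cross-checked from the raw counts. The sketch below is only a sanity check under the assumption that "Missing cells (%)" is computed over all cells (variables × observations) and that sizes use binary units (1 MiB = 1024² bytes); the variable names are illustrative and not part of the report.

```python
# Values copied from the "Dataset statistics" table above.
n_variables = 11
n_observations = 149_999
missing_cells = 3_924
avg_record_bytes = 88.0

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = average record size * number of records, in MiB.
total_size_mib = avg_record_bytes * n_observations / 1024**2

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size: {total_size_mib:.1f} MiB")
```

Both rounded results agree with the reported 0.2% and 12.6 MiB.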
Variable types
| NUM | 10 |
|---|---|
| BOOL | 1 |
Reproduction
| Analysis started | 2020-08-11 03:49:30.890182 |
|---|---|
| Analysis finished | 2020-08-11 03:50:15.055461 |
| Duration | 44.17 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Warning | Type |
|---|---|
| Unnamed: 0 is highly correlated with df_index | High correlation |
| df_index is highly correlated with Unnamed: 0 | High correlation |
| NumberOfDependents has 3924 (2.6%) missing values | Missing |
| RevolvingUtilizationOfUnsecuredLines is highly skewed (γ1 = 97.63124905) | Skewed |
| NumberOfTime30-59DaysPastDueNotWorse is highly skewed (γ1 = 22.59703929) | Skewed |
| DebtRatio is highly skewed (γ1 = 95.15750074) | Skewed |
| MonthlyIncome is highly skewed (γ1 = 29.41205776) | Skewed |
| df_index has unique values | Unique |
| Unnamed: 0 has unique values | Unique |
| RevolvingUtilizationOfUnsecuredLines has 10878 (7.3%) zeros | Zeros |
| NumberOfTime30-59DaysPastDueNotWorse has 126018 (84.0%) zeros | Zeros |
| DebtRatio has 4113 (2.7%) zeros | Zeros |
| MonthlyIncome has 1634 (1.1%) zeros | Zeros |
| NumberOfOpenCreditLinesAndLoans has 1888 (1.3%) zeros | Zeros |
| NumberRealEstateLoansOrLines has 56188 (37.5%) zeros | Zeros |
| NumberOfDependents has 86902 (57.9%) zeros | Zeros |
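The γ1 values in the skewness warnings are moment-based skewness coefficients. As a minimal sketch, the function below computes the population form, the third central moment divided by the 1.5 power of the second; the report's exact estimator may include a bias correction, so this will not necessarily reproduce its digits. The sample lists are hypothetical and not taken from the dataset.

```python
def skewness(values):
    """Population skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical samples: one symmetric, one dominated by zeros with a single
# large outlier, loosely mimicking the zero-heavy count variables above.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [0, 0, 0, 0, 1, 1, 2, 98]

skew_symmetric = skewness(symmetric)
skew_right_tailed = skewness(right_tailed)
print(skew_symmetric, skew_right_tailed)
```

A symmetric sample gives a value of zero, while a single large outlier on the right pushes the coefficient well above zero, which is the pattern the warnings flag.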
df_index
| Distinct count | 149999 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 74999.56203041354 |
| Minimum | 0 |
| Maximum | 149999 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 7499.9 |
| Q1 | 37499.5 |
| median | 75000 |
| Q3 | 112499.5 |
| 95-th percentile | 142499.1 |
| Maximum | 149999 |
| Range | 149999 |
| Interquartile range (IQR) | 75000 |
Descriptive statistics
| Standard deviation | 43301.5522 |
|---|---|
| Coefficient of variation (CV) | 0.5773574009 |
| Kurtosis | -1.200010906 |
| Mean | 74999.56203 |
| Median Absolute Deviation (MAD) | 37500 |
| Skewness | -4.231478641e-06 |
| Sum | 1.124985930e+10 |
| Variance | 1875024423 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 1 | < 0.1% | |
| 113949 | 1 | < 0.1% | |
| 15661 | 1 | < 0.1% | |
| 13612 | 1 | < 0.1% | |
| 3371 | 1 | < 0.1% | |
| 1322 | 1 | < 0.1% | |
| 7465 | 1 | < 0.1% | |
| 5416 | 1 | < 0.1% | |
| 27943 | 1 | < 0.1% | |
| 25894 | 1 | < 0.1% | |
| Other values (149989) | 149989 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 149999 | 1 | < 0.1% | |
| 149998 | 1 | < 0.1% | |
| 149997 | 1 | < 0.1% | |
| 149996 | 1 | < 0.1% | |
| 149995 | 1 | < 0.1% |
Unnamed: 0
| Distinct count | 149999 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 75000.56203041354 |
| Minimum | 1 |
| Maximum | 150000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 7500.9 |
| Q1 | 37500.5 |
| median | 75001 |
| Q3 | 112500.5 |
| 95-th percentile | 142500.1 |
| Maximum | 150000 |
| Range | 149999 |
| Interquartile range (IQR) | 75000 |
Descriptive statistics
| Standard deviation | 43301.5522 |
|---|---|
| Coefficient of variation (CV) | 0.5773497028 |
| Kurtosis | -1.200010906 |
| Mean | 75000.56203 |
| Median Absolute Deviation (MAD) | 37500 |
| Skewness | -4.231478641e-06 |
| Sum | 1.12500093e+10 |
| Variance | 1875024423 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 1 | < 0.1% | |
| 109855 | 1 | < 0.1% | |
| 11567 | 1 | < 0.1% | |
| 9518 | 1 | < 0.1% | |
| 15661 | 1 | < 0.1% | |
| 13612 | 1 | < 0.1% | |
| 3371 | 1 | < 0.1% | |
| 1322 | 1 | < 0.1% | |
| 7465 | 1 | < 0.1% | |
| 5416 | 1 | < 0.1% | |
| Other values (149989) | 149989 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 150000 | 1 | < 0.1% | |
| 149999 | 1 | < 0.1% | |
| 149998 | 1 | < 0.1% | |
| 149997 | 1 | < 0.1% | |
| 149996 | 1 | < 0.1% |
SeriousDlqin2yrs
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 MiB |
| 0 | 139973 |
|---|---|
| 1 | 10026 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 139973 | 93.3% | |
| 1 | 10026 | 6.7% |
RevolvingUtilizationOfUnsecuredLines
| Distinct count | 125728 |
|---|---|
| Unique (%) | 83.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.048471711145627 |
| Minimum | 0.0 |
| Maximum | 50708.0 |
| Zeros | 10878 |
| Zeros (%) | 7.3% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.029866918 |
| median | 0.154175793 |
| Q3 | 0.5590437525 |
| 95-th percentile | 0.9999999 |
| Maximum | 50708 |
| Range | 50708 |
| Interquartile range (IQR) | 0.5291768345 |
Descriptive statistics
| Standard deviation | 249.7562028 |
|---|---|
| Coefficient of variation (CV) | 41.29244787 |
| Kurtosis | 14544.61645 |
| Mean | 6.048471711 |
| Median Absolute Deviation (MAD) | 0.148322474 |
| Skewness | 97.63124905 |
| Sum | 907264.7082 |
| Variance | 62378.16084 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10878 | 7.3% | |
| 0.9999999 | 10255 | 6.8% | |
| 1 | 17 | < 0.1% | |
| 0.9500998 | 8 | < 0.1% | |
| 0.71314741 | 6 | < 0.1% | |
| 0.007984032 | 6 | < 0.1% | |
| 0.954091816 | 6 | < 0.1% | |
| 0.796407186 | 5 | < 0.1% | |
| 0.850299401 | 5 | < 0.1% | |
| 0.538922156 | 5 | < 0.1% | |
| Other values (125718) | 128808 | 85.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10878 | 7.3% | |
| 8.37e-06 | 1 | < 0.1% | |
| 9.93e-06 | 1 | < 0.1% | |
| 1.25e-05 | 1 | < 0.1% | |
| 1.43e-05 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 50708 | 1 | < 0.1% | |
| 29110 | 1 | < 0.1% | |
| 22198 | 1 | < 0.1% | |
| 22000 | 1 | < 0.1% | |
| 20514 | 1 | < 0.1% |
age
Real number (ℝ≥0)
| Distinct count | 85 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 52.295555303702024 |
| Minimum | 21 |
| Maximum | 109 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 29 |
| Q1 | 41 |
| median | 52 |
| Q3 | 63 |
| 95-th percentile | 78 |
| Maximum | 109 |
| Range | 88 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 14.77129796 |
|---|---|
| Coefficient of variation (CV) | 0.2824580001 |
| Kurtosis | -0.4953320655 |
| Mean | 52.2955553 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.1892426318 |
| Sum | 7844281 |
| Variance | 218.1912435 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 49 | 3837 | 2.6% | |
| 48 | 3806 | 2.5% | |
| 50 | 3753 | 2.5% | |
| 47 | 3719 | 2.5% | |
| 63 | 3719 | 2.5% | |
| 46 | 3714 | 2.5% | |
| 53 | 3648 | 2.4% | |
| 51 | 3627 | 2.4% | |
| 52 | 3609 | 2.4% | |
| 56 | 3589 | 2.4% | |
| Other values (75) | 112978 | 75.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 183 | 0.1% | |
| 22 | 434 | 0.3% | |
| 23 | 641 | 0.4% | |
| 24 | 816 | 0.5% | |
| 25 | 953 | 0.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 109 | 2 | < 0.1% | |
| 107 | 1 | < 0.1% | |
| 105 | 1 | < 0.1% | |
| 103 | 3 | < 0.1% | |
| 102 | 3 | < 0.1% |
NumberOfTime30-59DaysPastDueNotWorse
| Distinct count | 16 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4210294735298235 |
| Minimum | 0 |
| Maximum | 98 |
| Zeros | 126018 |
| Zeros (%) | 84.0% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 2 |
| Maximum | 98 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 4.192794982 |
|---|---|
| Coefficient of variation (CV) | 9.958435799 |
| Kurtosis | 522.3732593 |
| Mean | 0.4210294735 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 22.59703929 |
| Sum | 63154 |
| Variance | 17.57952976 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 126018 | 84.0% | |
| 1 | 16032 | 10.7% | |
| 2 | 4598 | 3.1% | |
| 3 | 1754 | 1.2% | |
| 4 | 747 | 0.5% | |
| 5 | 342 | 0.2% | |
| 98 | 264 | 0.2% | |
| 6 | 140 | 0.1% | |
| 7 | 54 | < 0.1% | |
| 8 | 25 | < 0.1% | |
| Other values (6) | 25 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 126018 | 84.0% | |
| 1 | 16032 | 10.7% | |
| 2 | 4598 | 3.1% | |
| 3 | 1754 | 1.2% | |
| 4 | 747 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 98 | 264 | 0.2% | |
| 96 | 5 | < 0.1% | |
| 13 | 1 | < 0.1% | |
| 12 | 2 | < 0.1% | |
| 11 | 1 | < 0.1% |
DebtRatio
| Distinct count | 114193 |
|---|---|
| Unique (%) | 76.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 353.0074262338636 |
| Minimum | 0.0 |
| Maximum | 329664.0 |
| Zeros | 4113 |
| Zeros (%) | 2.7% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.004329004 |
| Q1 | 0.1750736325 |
| median | 0.366503221 |
| Q3 | 0.8682570065 |
| 95-th percentile | 2449 |
| Maximum | 329664 |
| Range | 329664 |
| Interquartile range (IQR) | 0.693183374 |
Descriptive statistics
| Standard deviation | 2037.825113 |
|---|---|
| Coefficient of variation (CV) | 5.772754229 |
| Kurtosis | 13734.20232 |
| Mean | 353.0074262 |
| Median Absolute Deviation (MAD) | 0.245720104 |
| Skewness | 95.15750074 |
| Sum | 52950760.93 |
| Variance | 4152731.19 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4113 | 2.7% | |
| 1 | 229 | 0.2% | |
| 4 | 174 | 0.1% | |
| 2 | 170 | 0.1% | |
| 3 | 162 | 0.1% | |
| 5 | 143 | 0.1% | |
| 9 | 125 | 0.1% | |
| 10 | 117 | 0.1% | |
| 7 | 115 | 0.1% | |
| 13 | 114 | 0.1% | |
| Other values (114183) | 144537 | 96.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4113 | 2.7% | |
| 2.6e-05 | 1 | < 0.1% | |
| 3.69e-05 | 1 | < 0.1% | |
| 3.93e-05 | 1 | < 0.1% | |
| 6.62e-05 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 329664 | 1 | < 0.1% | |
| 326442 | 1 | < 0.1% | |
| 307001 | 1 | < 0.1% | |
| 220516 | 1 | < 0.1% | |
| 168835 | 1 | < 0.1% |
MonthlyIncome
| Distinct count | 13584 |
|---|---|
| Unique (%) | 9.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5587.847976319842 |
| Minimum | 0.0 |
| Maximum | 500000.0 |
| Zeros | 1634 |
| Zeros (%) | 1.1% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 820 |
| Q1 | 1800 |
| median | 4357 |
| Q3 | 7400 |
| 95-th percentile | 13500 |
| Maximum | 500000 |
| Range | 500000 |
| Interquartile range (IQR) | 5600 |
Descriptive statistics
| Standard deviation | 7778.687841 |
|---|---|
| Coefficient of variation (CV) | 1.392072203 |
| Kurtosis | 1608.138244 |
| Mean | 5587.847976 |
| Median Absolute Deviation (MAD) | 2557 |
| Skewness | 29.41205776 |
| Sum | 838171608.6 |
| Variance | 60507984.52 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1800 | 12748 | 8.5% | |
| 1328.6 | 12346 | 8.2% | |
| 820 | 5260 | 3.5% | |
| 5000 | 2757 | 1.8% | |
| 4000 | 2106 | 1.4% | |
| 6000 | 1933 | 1.3% | |
| 3000 | 1758 | 1.2% | |
| 0 | 1634 | 1.1% | |
| 2500 | 1551 | 1.0% | |
| 10000 | 1466 | 1.0% | |
| Other values (13574) | 106440 | 71.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1634 | 1.1% | |
| 1 | 605 | 0.4% | |
| 2 | 6 | < 0.1% | |
| 4 | 2 | < 0.1% | |
| 5 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 500000 | 12 | < 0.1% | |
| 440000 | 1 | < 0.1% | |
| 428250 | 1 | < 0.1% | |
| 408333 | 1 | < 0.1% | |
| 324000 | 1 | < 0.1% |
NumberOfOpenCreditLinesAndLoans
| Distinct count | 58 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.452776351842346 |
| Minimum | 0 |
| Maximum | 58 |
| Zeros | 1888 |
| Zeros (%) | 1.3% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 8 |
| Q3 | 11 |
| 95-th percentile | 18 |
| Maximum | 58 |
| Range | 58 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 5.145964246 |
|---|---|
| Coefficient of variation (CV) | 0.6087898262 |
| Kurtosis | 3.091028799 |
| Mean | 8.452776352 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.215303679 |
| Sum | 1267908 |
| Variance | 26.48094802 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 13613 | 9.1% | |
| 7 | 13245 | 8.8% | |
| 5 | 12931 | 8.6% | |
| 8 | 12562 | 8.4% | |
| 4 | 11609 | 7.7% | |
| 9 | 11355 | 7.6% | |
| 10 | 9624 | 6.4% | |
| 3 | 9058 | 6.0% | |
| 11 | 8321 | 5.5% | |
| 12 | 7005 | 4.7% | |
| Other values (48) | 40676 | 27.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1888 | 1.3% | |
| 1 | 4438 | 3.0% | |
| 2 | 6666 | 4.4% | |
| 3 | 9058 | 6.0% | |
| 4 | 11609 | 7.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 58 | 1 | < 0.1% | |
| 57 | 2 | < 0.1% | |
| 56 | 2 | < 0.1% | |
| 54 | 4 | < 0.1% | |
| 53 | 1 | < 0.1% |
NumberRealEstateLoansOrLines
| Distinct count | 28 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.0182334548896992 |
| Minimum | 0 |
| Maximum | 54 |
| Zeros | 56188 |
| Zeros (%) | 37.5% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 54 |
| Range | 54 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.129771907 |
|---|---|
| Coefficient of variation (CV) | 1.109541139 |
| Kurtosis | 60.47710052 |
| Mean | 1.018233455 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 3.482511689 |
| Sum | 152734 |
| Variance | 1.276384562 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 56188 | 37.5% | |
| 1 | 52338 | 34.9% | |
| 2 | 31521 | 21.0% | |
| 3 | 6300 | 4.2% | |
| 4 | 2170 | 1.4% | |
| 5 | 689 | 0.5% | |
| 6 | 320 | 0.2% | |
| 7 | 171 | 0.1% | |
| 8 | 93 | 0.1% | |
| 9 | 78 | 0.1% | |
| Other values (18) | 131 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 56188 | 37.5% | |
| 1 | 52338 | 34.9% | |
| 2 | 31521 | 21.0% | |
| 3 | 6300 | 4.2% | |
| 4 | 2170 | 1.4% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 54 | 1 | < 0.1% | |
| 32 | 1 | < 0.1% | |
| 29 | 1 | < 0.1% | |
| 26 | 1 | < 0.1% | |
| 25 | 3 | < 0.1% |
NumberOfDependents
| Distinct count | 9 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 3924 |
| Missing (%) | 2.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.7569946945062468 |
| Minimum | 0.0 |
| Maximum | 8.0 |
| Zeros | 86902 |
| Zeros (%) | 57.9% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.113064882 |
|---|---|
| Coefficient of variation (CV) | 1.470373426 |
| Kurtosis | 2.216483236 |
| Mean | 0.7569946945 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.542175099 |
| Sum | 110578 |
| Variance | 1.238913432 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 86902 | 57.9% | |
| 1 | 26316 | 17.5% | |
| 2 | 19521 | 13.0% | |
| 3 | 9483 | 6.3% | |
| 4 | 2862 | 1.9% | |
| 5 | 746 | 0.5% | |
| 6 | 158 | 0.1% | |
| 7 | 51 | < 0.1% | |
| 8 | 36 | < 0.1% | |
| (Missing) | 3924 | 2.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 86902 | 57.9% | |
| 1 | 26316 | 17.5% | |
| 2 | 19521 | 13.0% | |
| 3 | 9483 | 6.3% | |
| 4 | 2862 | 1.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 36 | < 0.1% | |
| 7 | 51 | < 0.1% | |
| 6 | 158 | 0.1% | |
| 5 | 746 | 0.5% | |
| 4 | 2862 | 1.9% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
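The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report itself runs; `pearson_r` and the toy data are hypothetical names and values introduced only for this example.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (std_x * std_y)


# Hypothetical toy data, not taken from the profiled dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_original = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
# Negating one variable flips the sign of r.
r_flipped = pearson_r(x, [-v for v in y])
print(r_original, r_rescaled, r_flipped)
```

The rescaled pair yields the same coefficient as the original pair, which illustrates the location and scale invariance; only a sign change in one variable alters the result.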
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
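As a rough sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. It assumes tied values share the average of their rank positions, which is a common convention; the function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this monotonic but nonlinear relationship, ρ reaches its maximum while r stays below it, which is the difference the paragraph describes.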
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
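The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above (often called tau-a); library implementations frequently apply a tie correction instead, so results can differ on data with many repeated values. The function name and sample data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    concordant = 0
    discordant = 0
    total_pairs = 0
    for i, j in combinations(range(len(xs)), 2):
        total_pairs += 1
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in either variable count toward the total only.
    return (concordant - discordant) / total_pairs


# Hypothetical data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.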
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available.
First rows
| | df_index | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberRealEstateLoansOrLines | NumberOfDependents |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 6 | 2.0 |
| 1 | 1 | 2 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 1.0 |
| 2 | 2 | 3 | 0 | 0.658180 | 38 | 1 | 0.085113 | 3042.0 | 2 | 0 | 0.0 |
| 3 | 3 | 4 | 0 | 0.233810 | 30 | 0 | 0.036050 | 3300.0 | 5 | 0 | 0.0 |
| 4 | 4 | 5 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 1 | 0.0 |
| 5 | 5 | 6 | 0 | 0.213179 | 74 | 0 | 0.375607 | 3500.0 | 3 | 1 | 1.0 |
| 6 | 6 | 7 | 0 | 0.305682 | 57 | 0 | 5710.000000 | 1800.0 | 8 | 3 | 0.0 |
| 7 | 7 | 8 | 0 | 0.754464 | 39 | 0 | 0.209940 | 3500.0 | 8 | 0 | 0.0 |
| 8 | 8 | 9 | 0 | 0.116951 | 27 | 0 | 46.000000 | 820.0 | 2 | 0 | NaN |
| 9 | 9 | 10 | 0 | 0.189169 | 57 | 0 | 0.606291 | 23684.0 | 9 | 4 | 2.0 |
Last rows
| | df_index | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberRealEstateLoansOrLines | NumberOfDependents |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 149989 | 149990 | 149991 | 0 | 0.055518 | 46 | 0 | 0.609779 | 4335.0 | 7 | 1 | 2.0 |
| 149990 | 149991 | 149992 | 0 | 0.104112 | 59 | 0 | 0.477658 | 10316.0 | 10 | 2 | 0.0 |
| 149991 | 149992 | 149993 | 0 | 0.871976 | 50 | 0 | 4132.000000 | 1800.0 | 11 | 1 | 3.0 |
| 149992 | 149993 | 149994 | 0 | 1.000000 | 22 | 0 | 0.000000 | 820.0 | 1 | 0 | 0.0 |
| 149993 | 149994 | 149995 | 0 | 0.385742 | 50 | 0 | 0.404293 | 3400.0 | 7 | 0 | 0.0 |
| 149994 | 149995 | 149996 | 0 | 0.040674 | 74 | 0 | 0.225131 | 2100.0 | 4 | 1 | 0.0 |
| 149995 | 149996 | 149997 | 0 | 0.299745 | 44 | 0 | 0.716562 | 5584.0 | 4 | 1 | 2.0 |
| 149996 | 149997 | 149998 | 0 | 0.246044 | 58 | 0 | 3870.000000 | 1800.0 | 18 | 1 | 0.0 |
| 149997 | 149998 | 149999 | 0 | 0.000000 | 30 | 0 | 0.000000 | 5716.0 | 4 | 0 | 0.0 |
| 149998 | 149999 | 150000 | 0 | 0.850283 | 64 | 0 | 0.249908 | 8158.0 | 8 | 2 | 0.0 |